Direct Policy Search and Uncertain Policy Evaluation
Abstract
Reinforcement learning based on direct search in policy space requires few assumptions about the environment. Hence it is applicable in certain situations where most traditional reinforcement learning algorithms based on dynamic programming are not, especially in partially observable, deterministic worlds. In realistic settings, however, reliable policy evaluations are complicated by numerous sources of uncertainty, such as stochasticity in policy and environment. Given a limited lifetime, how much time should a direct policy searcher spend on policy evaluations to obtain reliable statistics? Despite the fundamental nature of this question, it has not yet received much attention. Our efficient approach based on the success-story algorithm (SSA) is radical in the sense that it never stops evaluating any previous policy modification, except those it undoes for lack of empirical evidence that they have contributed to lifelong reward accelerations. Here we identify SSA's fundamental advantages over traditional direct policy search (such as stochastic hill-climbing) on problems involving several sources of stochasticity and uncertainty.
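The evaluation rule described above can be illustrated with a small sketch of the success-story criterion (SSC) that SSA enforces. SSA keeps a stack of still-valid policy modifications; at each checkpoint, the average reward per time step since each modification must strictly exceed the average since the modification below it (and, for the bottom entry, the average over the whole lifetime). While this fails, the most recent modification is undone. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a single scalar cumulative reward, checkpoints that occur strictly after every stored modification, and a caller-supplied `restore` function. The names `StackEntry`, `reward_rates`, `ssc_holds`, and `ssa_checkpoint` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class StackEntry:
    """One still-valid policy modification (hypothetical record layout)."""
    time: float        # t_i: time at which the modification was made
    reward: float      # R(t_i): cumulative reward collected up to t_i
    saved_params: Any  # policy components the modification overwrote


def reward_rates(stack: List[StackEntry], t: float, total_reward: float) -> List[float]:
    """Reward per time step since birth (time 0) and since each valid modification.

    Assumes t > 0 and t > e.time for every entry e (checkpoint after each modification).
    """
    rates = [total_reward / t]
    for e in stack:
        rates.append((total_reward - e.reward) / (t - e.time))
    return rates


def ssc_holds(stack: List[StackEntry], t: float, total_reward: float) -> bool:
    """Success-story criterion: rates must be strictly increasing from bottom to top."""
    rates = reward_rates(stack, t, total_reward)
    return all(a < b for a, b in zip(rates, rates[1:]))


def ssa_checkpoint(stack: List[StackEntry], t: float, total_reward: float,
                   restore: Callable[[Any], None]) -> int:
    """Pop and undo the most recent modifications until the SSC holds.

    Returns the number of modifications undone; survivors stay on the stack
    and keep being re-evaluated at every later checkpoint.
    """
    undone = 0
    while stack and not ssc_holds(stack, t, total_reward):
        entry = stack.pop()
        restore(entry.saved_params)  # roll the policy back to its pre-modification state
        undone += 1
    return undone


# Hypothetical usage: the policy is a dict; each entry stores the value it overwrote.
policy = {"w": 0.0}
stack: List[StackEntry] = []


def restore(saved: Any) -> None:
    policy.update(saved)


stack.append(StackEntry(time=10, reward=2, saved_params={"w": 0.0}))
policy["w"] = 0.5
stack.append(StackEntry(time=20, reward=12, saved_params={"w": 0.5}))
policy["w"] = 0.9

# Rates at t=30: lifetime 22/30, since t=10: 20/20 = 1.0, since t=20: 10/10 = 1.0.
# 1.0 < 1.0 fails, so the modification made at t=20 is undone.
n = ssa_checkpoint(stack, t=30, total_reward=22, restore=restore)
print(n, policy, len(stack))  # 1 {'w': 0.5} 1
```

The design choice this illustrates: a modification is never judged once and then forgotten. It survives only as long as the reward rate since its creation keeps beating that of everything older, so evaluation time is allocated implicitly rather than by a fixed per-candidate trial budget as in simple stochastic hill-climbing.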
Similar resources
Sequential Classification-Based Optimization for Direct Policy Search
Direct policy search, which employs optimization algorithms to search for the policy parameters that maximize total reward, often yields high-quality policies in complex reinforcement learning problems. Classification-based optimization is a recently developed framework for derivative-free optimization, which has been shown to be effective and efficient for non-convex optimization...
The Host Iranian Economy and Foreign Direct Investment: A Comparative Analysis
In this paper fourteen selected developing economies, including Iran, are compared to evaluate the Iranian position in attracting foreign direct investment (FDI). The evaluation is based on economic performance, risk, liberalization policy, and FDI determinant indicators. The results show that the Iranian economy has a sound economic performance and its economic, financial, and political risks ...
Employment, Wages and Optimal Monetary Policy
We study optimal monetary policy when the empirical evidence leaves the policymaker uncertain whether the true data-generating process is given by a model with sticky wages or a model with search and matching frictions in the labor market. Unless the policymaker is almost certain about the search and matching model being the correct data-generating process, the policymaker chooses to stabilize ...
A Review of Evidence-Informed Policy Making in Sustainable Healthy Food and Nutrition Systems
Background and purpose: A safe food system provides the conditions for consumers to decide about and choose the food products. This systematic review describes the alternatives in order to achieve a healthy nutrition pattern in a food system that can be used to make changes in current policies. Materials and methods: An electronic literature search was done in Google Scholar, Web of Science, P...
Providing Value to New Health Technology: The Early Contribution of Entrepreneurs, Investors, and Regulatory Agencies
Background: New technologies constitute an important cost-driver in healthcare, but the dynamics that lead to their emergence remain poorly understood from a health policy standpoint. The goal of this paper is to clarify how entrepreneurs, investors, and regulatory agencies influence the value of emerging health technologies. Methods: Our 5-year qualitative research program examined the proces...